/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Code to maintain and access indices.
 *
 * <h2>Table Of Contents</h2>
 *
 * <ol>
 *   <li><a href="#index">Index APIs</a>
 *       <ul>
 *         <li><a href="#writer">IndexWriter</a>
 *         <li><a href="#reader">IndexReader</a>
 *         <li><a href="#segments">Segments and docids</a>
 *       </ul>
 *   <li><a href="#field_types">Field types</a>
 *       <ul>
 *         <li><a href="#postings-desc">Postings</a>
 *         <li><a href="#stored-fields">Stored Fields</a>
 *         <li><a href="#termvectors">Term Vectors</a>
 *         <li><a href="#docvalues">DocValues</a>
 *         <li><a href="#points">Points</a>
 *         <li><a href="#knnvectors">KNN Vectors</a>
 *       </ul>
 *   <li><a href="#postings">Postings APIs</a>
 *       <ul>
 *         <li><a href="#terms">Terms</a>
 *         <li><a href="#documents">Documents</a>
 *         <li><a href="#positions">Positions</a>
 *       </ul>
 *   <li><a href="#stats">Index Statistics</a>
 *       <ul>
 *         <li><a href="#termstats">Term-level</a>
 *         <li><a href="#fieldstats">Field-level</a>
 *         <li><a href="#segmentstats">Segment-level</a>
 *         <li><a href="#documentstats">Document-level</a>
 *       </ul>
 * </ol>
 *
 * <a id="index"></a>
 *
 * <h2>Index APIs</h2>
 *
 * <a id="writer"></a>
 *
 * <h3>IndexWriter</h3>
 *
 * <p>{@link org.apache.lucene.index.IndexWriter} is used to create an index, and to add, update and
 * delete documents. The IndexWriter class is thread safe, and enforces a single instance per index.
 * Creating an IndexWriter creates a new index or opens an existing index for writing, in a {@link
 * org.apache.lucene.store.Directory}, depending on the configuration in {@link
 * org.apache.lucene.index.IndexWriterConfig}. A Directory is an abstraction that typically
 * represents a local file-system directory (see various implementations of {@link
 * org.apache.lucene.store.FSDirectory}), but it may also stand for some other storage, such as RAM.
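 *
 * <p>As a minimal sketch of the above, opening a writer on a file-system directory and adding a
 * document might look like this. The index path, the field name "body" and the choice of
 * StandardAnalyzer are assumptions made for illustration only:
 *
 * <pre class="prettyprint">
 * import java.nio.file.Paths;
 * import org.apache.lucene.analysis.standard.StandardAnalyzer;
 * import org.apache.lucene.document.Document;
 * import org.apache.lucene.document.Field;
 * import org.apache.lucene.document.TextField;
 * import org.apache.lucene.index.IndexWriter;
 * import org.apache.lucene.index.IndexWriterConfig;
 * import org.apache.lucene.store.Directory;
 * import org.apache.lucene.store.FSDirectory;
 *
 * // CREATE_OR_APPEND creates the index if it is missing and opens it for writing otherwise
 * IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
 * config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
 * try (Directory dir = FSDirectory.open(Paths.get("/tmp/example-index"));
 *     IndexWriter writer = new IndexWriter(dir, config)) {
 *   Document doc = new Document();
 *   doc.add(new TextField("body", "hello lucene", Field.Store.YES));
 *   writer.addDocument(doc);
 *   writer.commit(); // make the change durable and visible to newly opened readers
 * }
 * </pre>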
 * <a id="reader"></a>
 *
 * <h3>IndexReader</h3>
 *
 * <p>{@link org.apache.lucene.index.IndexReader} is used to read data from the index, and supports
 * searching. Many thread-safe readers may be {@link org.apache.lucene.index.DirectoryReader#open
 * open} concurrently with a single (or no) writer. Each reader maintains a consistent "point in
 * time" view of an index and must be explicitly refreshed (see {@link
 * org.apache.lucene.index.DirectoryReader#openIfChanged(DirectoryReader, IndexWriter)}) in order to
 * incorporate writes that may occur after it is opened. <a id="segments"></a>
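 *
 * <p>A hedged sketch of that refresh cycle, assuming a variable <code>writer</code> that refers to
 * an open IndexWriter (the variable names are illustrative):
 *
 * <pre class="prettyprint">
 * DirectoryReader reader = DirectoryReader.open(writer);
 * // ... documents are added, updated or deleted through the writer ...
 * // openIfChanged returns null when the reader is already up to date
 * DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer);
 * if (newReader != null) {
 *   reader.close(); // release the old point-in-time view
 *   reader = newReader;
 * }
 * </pre>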
 *
 * <h3>Segments and docids</h3>
 *
 * <p>Lucene's index is composed of segments, each of which contains a subset of all the documents
 * in the index, and is a complete searchable index in itself, over that subset. As documents are
 * written to the index, new segments are created and flushed to directory storage. Segments are
 * composed of an immutable core and per-commit live documents and doc-value updates. Insertions add
 * new segments. Deletions and doc-value updates in a given segment create a new segment that shares
 * the core of the previous segment but has its own live docs and doc-value updates. An update is
 * implemented as an atomic deletion followed by an insertion.
 *
 * <p>Over time, the writer merges groups of smaller segments into single larger ones in order to
 * maintain an index that is efficient to search, and to reclaim dead space left behind by deleted
 * (and updated) documents.
 *
 * <p>Each document is identified by a 32-bit number, its "docid," and is composed of a collection
 * of Field values of diverse types (postings, stored fields, term vectors, doc values, points and
 * knn vectors). Docids come in two flavors: global and per-segment. A document's global docid is
 * just the sum of its per-segment docid and that segment's base docid offset. External, high-level
 * APIs only handle global docids, but internal APIs that reference a {@link
 * org.apache.lucene.index.LeafReader}, which is a reader for a single segment, deal in per-segment
 * docids.
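 *
 * <p>The relationship between the two flavors can be sketched as follows, assuming an open
 * <code>reader</code> and a valid <code>globalDocId</code> (both names are illustrative):
 *
 * <pre class="prettyprint">
 * List&lt;LeafReaderContext&gt; leaves = reader.leaves();
 * // find the segment whose docid range contains the global docid
 * LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(globalDocId, leaves));
 * int segmentDocId = globalDocId - leaf.docBase; // per-segment docid
 * int sameGlobalDocId = leaf.docBase + segmentDocId; // and back again
 * </pre>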
 *
 * <p>Docids are assigned sequentially within each segment (starting at 0). Thus the number of
 * documents in a segment is one more than its maximum docid; some may be deleted, but their
 * docids are retained until the segment is merged. When segments merge, their documents are
 * assigned new sequential docids. Accordingly, docid values must always be treated as an internal
 * implementation detail, not exposed as part of an application, nor stored or referenced outside
 * of Lucene's internal APIs. <a id="field_types"></a>
 *
 * <h2>Field Types</h2>
 *
 * <a id="postings-desc"></a>
 *
 * <p>Lucene supports a variety of different document field data structures. Lucene's core, the
 * inverted index, is composed of "postings." The postings, with their term dictionary, can be
 * thought of as a map that provides efficient lookup given a {@link org.apache.lucene.index.Term}
 * (roughly, a word or token), to (the ordered list of) {@link org.apache.lucene.document.Document}s
 * containing that Term. Codecs may additionally record {@link
 * org.apache.lucene.index.ImpactsEnum#getImpacts impacts} alongside postings in order to be able to
 * skip over low-scoring documents at search time. Postings do not provide any way of retrieving
 * terms given a document, short of scanning the entire index. <a id="stored-fields"></a>
 *
 * <p>Stored fields are essentially the opposite of postings, providing efficient retrieval of field
 * values given a docid. All stored field values for a document are stored together in a block.
 * Different types of stored field provide high-level datatypes such as strings and numbers on top
 * of the underlying bytes. Stored field values are usually retrieved by the searcher using an
 * implementation of {@link org.apache.lucene.index.StoredFieldVisitor}. <a id="termvectors"></a>
 *
 * <p>{@link org.apache.lucene.index.TermVectors} store a per-document inverted index. They are
 * useful for finding similar documents, called MoreLikeThis in Lucene. <a id="docvalues"></a>
 *
 * <p>{@link org.apache.lucene.index.DocValues} fields are what are sometimes referred to as
 * columnar, or column-stride fields, by analogy to relational database terminology, in which
 * documents are considered as rows, and fields, columns. DocValues fields store values per-field: a
 * value for every document is held in a single data structure, providing for rapid, sequential
 * lookup of a field-value given a docid. These fields are used for efficient value-based sorting,
 * for faceting, and sometimes for filtering on the least selective clauses of a query. <a
 * id="points"></a>
 *
 * <p>{@link org.apache.lucene.index.PointValues} represent numeric values using a kd-tree data
 * structure. Efficient one-dimensional and higher-dimensional implementations make these the
 * choice for numeric range and interval queries, and geo-spatial queries. <a id="knnvectors"></a>
 *
 * <p>{@link org.apache.lucene.index.KnnVectorValues} represent dense numeric vectors whose
 * dimensions may either be bytes or floats. They are indexed in a way that allows searching for
 * nearest neighbors. The vectors are typically produced by a machine-learned model, and used to
 * perform semantic search.
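 *
 * <p>To make the distinctions above concrete, here is a sketch of a single document that uses
 * several of these structures. The field names and values are invented for illustration,
 * <code>writer</code> is assumed to be an open IndexWriter, and an application would normally
 * index only the structures it needs:
 *
 * <pre class="prettyprint">
 * import org.apache.lucene.document.*;
 *
 * Document doc = new Document();
 * // postings (inverted index) only; the original text is not stored
 * doc.add(new TextField("body", "the quick brown fox", Field.Store.NO));
 * // stored field: retrievable by docid, not searchable by itself
 * doc.add(new StoredField("title", "A quick fox"));
 * // doc values: columnar storage for sorting and faceting
 * doc.add(new NumericDocValuesField("year", 2024L));
 * // points: kd-tree used for range queries on the same value
 * doc.add(new IntPoint("year", 2024));
 * // knn vector: dense float vector for nearest-neighbor search
 * doc.add(new KnnFloatVectorField("embedding", new float[] {0.1f, 0.2f, 0.3f}));
 * writer.addDocument(doc);
 * </pre>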
 *
 * <a id="postings"></a>
 *
 * <h2>Postings APIs</h2>
 *
 * <a id="terms"></a>
 *
 * <h3>Terms</h3>
 *
 * <p>{@link org.apache.lucene.index.Terms} represents the collection of terms within a field,
 * exposes some metadata and <a href="#fieldstats">statistics</a>, and an API for enumeration.
 *
 * <pre class="prettyprint">
 * Terms terms = leafReader.terms("body");
 * // metadata about the field
 * System.out.println("positions? " + terms.hasPositions());
 * System.out.println("offsets? " + terms.hasOffsets());
 * System.out.println("payloads? " + terms.hasPayloads());
 * // iterate through terms
 * TermsEnum termsEnum = terms.iterator();
 * BytesRef term = null;
 * while ((term = termsEnum.next()) != null) {
 *   doSomethingWith(term);
 * }
 * </pre>
 *
 * <p>{@link org.apache.lucene.index.TermsEnum} provides an iterator over the list of terms within
 * a field, some <a href="#termstats">statistics</a> about the term, and methods to access the
 * term's <a href="#documents">documents</a> and <a href="#positions">positions</a>.
 *
 * <pre class="prettyprint">
 * // seek to a specific term
 * boolean found = termsEnum.seekExact(new BytesRef("foobar"));
 * if (found) {
 *   // get the document frequency
 *   System.out.println(termsEnum.docFreq());
 *   // enumerate through documents
 *   PostingsEnum docs = termsEnum.postings(null);
 *   // enumerate through documents and positions
 *   PostingsEnum docsAndPositions = termsEnum.postings(null, PostingsEnum.POSITIONS);
 * }
 * </pre>
 *
 * <a id="documents"></a>
 *
 * <h3>Documents</h3>
 *
 * <p>{@link org.apache.lucene.index.PostingsEnum} is an extension of {@link
 * org.apache.lucene.search.DocIdSetIterator} that iterates over the list of documents for a term,
 * along with the term frequency within that document.
 *
 * <pre class="prettyprint">
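 * // postings(reuse) includes term frequencies by default
 * PostingsEnum docsEnum = termsEnum.postings(null);
 * int docid;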
 * while ((docid = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
 *   System.out.println(docid);
 *   System.out.println(docsEnum.freq());
 * }
 * </pre>
 *
 * <a id="positions"></a>
 *
 * <h3>Positions</h3>
 *
 * <p>PostingsEnum also allows iteration over the positions at which a term occurred within the
 * document, and any additional per-position information (offsets and payload). The information
 * available is controlled by flags passed to {@link
 * org.apache.lucene.index.TermsEnum#postings(PostingsEnum, int)}.
 *
 * <pre class="prettyprint">
 * int docid;
 * PostingsEnum postings = termsEnum.postings(null, PostingsEnum.PAYLOADS | PostingsEnum.OFFSETS);
 * while ((docid = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
 *   System.out.println(docid);
 *   int freq = postings.freq();
 *   for (int i = 0; i &lt; freq; i++) {
 *     System.out.println(postings.nextPosition());
 *     System.out.println(postings.startOffset());
 *     System.out.println(postings.endOffset());
 *     System.out.println(postings.getPayload());
 *   }
 * }
 * </pre>
 *
 * <h3>Impacts</h3>
 *
 * <p>TermsEnum also allows returning an {@link org.apache.lucene.index.ImpactsEnum}, an extension
 * of PostingsEnum that exposes pareto-optimal tuples of (term frequency, length normalization
 * factor) per block of postings. It is typically used to compute the maximum possible score over
 * these blocks of postings, so that they can be skipped if they cannot possibly produce a
 * competitive hit.
 *
 * <pre class="prettyprint">
 * ImpactsEnum impactsEnum = termsEnum.impacts(PostingsEnum.FREQS);
 * int targetDocID = 420;
 * impactsEnum.advanceShallow(targetDocID);
 * // These impacts expose pareto-optimal tuples of (termFreq, lengthNorm) over various ranges of
 * // doc IDs.
 * Impacts impacts = impactsEnum.getImpacts();
 * for (int level = 0; level &lt; impacts.numLevels(); level++) {
 *   int docIdUpTo = impacts.getDocIdUpTo(level);
 *   // List of pareto-optimal (termFreq, lengthNorm) tuples between targetDocID inclusive and
 *   // docIdUpTo inclusive.
 *   List&lt;Impact&gt; perLevelImpacts = impacts.getImpacts(level);
 * }
 * </pre>
 *
 * <a id="stats"></a>
 *
 * <h2>Index Statistics</h2>
 *
 * <a id="termstats"></a>
 *
 * <h3>Term statistics</h3>
 *
 * <ul>
 *   <li>{@link org.apache.lucene.index.TermsEnum#docFreq}: Returns the number of documents that
 *       contain at least one occurrence of the term. This statistic is always available for an
 *       indexed term. Note that it also counts deleted documents; when segments are merged, the
 *       statistic is updated as those deleted documents are merged away.
 *   <li>{@link org.apache.lucene.index.TermsEnum#totalTermFreq}: Returns the number of occurrences
 *       of this term across all documents. Like docFreq(), it will also count occurrences that
 *       appear in deleted documents.
 * </ul>
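 *
 * <p>As an illustrative sketch, the same two statistics can also be read for a whole index through
 * {@link org.apache.lucene.index.IndexReader#docFreq(Term)} and {@link
 * org.apache.lucene.index.IndexReader#totalTermFreq(Term)}. The field and term text below are
 * placeholders, and <code>reader</code> is assumed to be an open IndexReader:
 *
 * <pre class="prettyprint">
 * Term term = new Term("body", "lucene");
 * int docFreq = reader.docFreq(term); // documents containing the term, deleted ones included
 * long totalTermFreq = reader.totalTermFreq(term); // occurrences across those documents
 * // average number of occurrences per document that contains the term
 * double avgFreq = docFreq == 0 ? 0 : (double) totalTermFreq / docFreq;
 * </pre>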
 *
 * <a id="fieldstats"></a>
 *
 * <h3>Field statistics</h3>
 *
 * <ul>
 *   <li>{@link org.apache.lucene.index.Terms#size}: Returns the number of unique terms in the
 *       field. This statistic may be unavailable (returns <code>-1</code>) for some Terms
 *       implementations such as {@link org.apache.lucene.index.MultiTerms}, where it cannot be
 *       efficiently computed. Note that this count also includes terms that appear only in deleted
 *       documents: when segments are merged such terms are also merged away and the statistic is
 *       then updated.
 *   <li>{@link org.apache.lucene.index.Terms#getDocCount}: Returns the number of documents that
 *       contain at least one occurrence of any term for this field. This can be thought of as a
 *       Field-level docFreq(). Like docFreq() it will also count deleted documents.
 *   <li>{@link org.apache.lucene.index.Terms#getSumDocFreq}: Returns the number of postings
 *       (term-document mappings in the inverted index) for the field. This can be thought of as the
 *       sum of {@link org.apache.lucene.index.TermsEnum#docFreq} across all terms in the field, and
 *       like docFreq() it will also count postings that appear in deleted documents.
 *   <li>{@link org.apache.lucene.index.Terms#getSumTotalTermFreq}: Returns the number of tokens for
 *       the field. This can be thought of as the sum of {@link
 *       org.apache.lucene.index.TermsEnum#totalTermFreq} across all terms in the field, and like
 *       totalTermFreq() it will also count occurrences that appear in deleted documents.
 * </ul>
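 *
 * <p>These field-level statistics are often combined into averages. The following sketch assumes
 * a <code>leafReader</code> and a field named "body" (both illustrative), and is not a description
 * of how any particular Similarity uses them:
 *
 * <pre class="prettyprint">
 * Terms terms = leafReader.terms("body");
 * if (terms != null) {
 *   int docCount = terms.getDocCount();
 *   // average number of tokens per document that has this field
 *   double avgFieldLength = docCount == 0 ? 0 : (double) terms.getSumTotalTermFreq() / docCount;
 *   // average number of distinct terms per document that has this field
 *   double avgUniqueTerms = docCount == 0 ? 0 : (double) terms.getSumDocFreq() / docCount;
 * }
 * </pre>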
 *
 * <a id="segmentstats"></a>
 *
 * <h3>Segment statistics</h3>
 *
 * <ul>
 *   <li>{@link org.apache.lucene.index.IndexReader#maxDoc}: Returns the number of documents
 *       (including deleted documents) in the index.
 *   <li>{@link org.apache.lucene.index.IndexReader#numDocs}: Returns the number of live documents
 *       (excluding deleted documents) in the index.
 *   <li>{@link org.apache.lucene.index.IndexReader#numDeletedDocs}: Returns the number of deleted
 *       documents in the index.
 * </ul>
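 *
 * <p>A short sketch of reading these counts per segment and for the whole index, assuming an open
 * <code>reader</code> (an illustrative name):
 *
 * <pre class="prettyprint">
 * for (LeafReaderContext ctx : reader.leaves()) {
 *   LeafReader leaf = ctx.reader();
 *   // numDeletedDocs() is always maxDoc() - numDocs()
 *   System.out.println(
 *       "maxDoc=" + leaf.maxDoc() + " numDocs=" + leaf.numDocs()
 *           + " deleted=" + leaf.numDeletedDocs());
 * }
 * // the same methods on the top-level reader report totals for the whole index
 * int maxDoc = reader.maxDoc();
 * double deletedRatio = maxDoc == 0 ? 0 : (double) reader.numDeletedDocs() / maxDoc;
 * </pre>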
 *
 * <a id="documentstats"></a>
 *
 * <h3>Document statistics</h3>
 *
 * <p>Document statistics are available during the indexing process for an indexed field: typically
 * a {@link org.apache.lucene.search.similarities.Similarity} implementation will store some of
 * these values (possibly in a lossy way) in the normalization value for the document in its
 * {@link org.apache.lucene.search.similarities.Similarity#computeNorm} method.
 *
 * <ul>
 *   <li>{@link org.apache.lucene.index.FieldInvertState#getLength}: Returns the number of tokens
 *       for this field in the document. Note that this is just the number of times that {@link
 *       org.apache.lucene.analysis.TokenStream#incrementToken} returned true, and is unrelated to
 *       the values in {@link
 *       org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute}.
 *   <li>{@link org.apache.lucene.index.FieldInvertState#getNumOverlap}: Returns the number of
 *       tokens for this field in the document that had a position increment of zero. This can be
 *       used to compute a document length that discounts artificial tokens such as synonyms.
 *   <li>{@link org.apache.lucene.index.FieldInvertState#getPosition}: Returns the accumulated
 *       position value for this field in the document: computed from the values of {@link
 *       org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute} and including {@link
 *       org.apache.lucene.analysis.Analyzer#getPositionIncrementGap}s across multivalued fields.
 *   <li>{@link org.apache.lucene.index.FieldInvertState#getOffset}: Returns the total character
 *       offset value for this field in the document: computed from the values of {@link
 *       org.apache.lucene.analysis.tokenattributes.OffsetAttribute} returned by {@link
 *       org.apache.lucene.analysis.TokenStream#end}, and including {@link
 *       org.apache.lucene.analysis.Analyzer#getOffsetGap}s across multivalued fields.
 *   <li>{@link org.apache.lucene.index.FieldInvertState#getUniqueTermCount}: Returns the number of
 *       unique terms encountered for this field in the document.
 *   <li>{@link org.apache.lucene.index.FieldInvertState#getMaxTermFrequency}: Returns the maximum
 *       frequency across all unique terms encountered for this field in the document.
 * </ul>
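 *
 * <p>For example, a document length that discounts overlapping tokens could be derived from the
 * first two statistics. The helper below is hypothetical (it is not part of Lucene) and only
 * sketches the arithmetic that a custom computeNorm implementation might apply before encoding
 * the value:
 *
 * <pre class="prettyprint">
 * // hypothetical helper, not part of Lucene
 * static int discountedLength(FieldInvertState state) {
 *   // tokens with a position increment of zero (such as injected synonyms) are not counted
 *   return state.getLength() - state.getNumOverlap();
 * }
 * </pre>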
 *
 * <p>Additional user-supplied statistics can be added to the document as DocValues fields and
 * accessed via {@link org.apache.lucene.index.LeafReader#getNumericDocValues}.
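 *
 * <p>A minimal sketch of reading such a value, assuming a numeric DocValues field named "boost"
 * and a per-segment docid held in <code>segmentDocId</code> (both are illustrative):
 *
 * <pre class="prettyprint">
 * NumericDocValues boosts = leafReader.getNumericDocValues("boost");
 * if (boosts != null) { // null when no numeric doc values were indexed for this field
 *   // advanceExact returns false when this document has no value for the field
 *   if (boosts.advanceExact(segmentDocId)) {
 *     long boost = boosts.longValue();
 *   }
 * }
 * </pre>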
 */
package org.apache.lucene.index;
